Free-energy Based Scoring for Speech Recognition
Authors
Abstract
Traditionally, speech recognizers have used a strictly Bayesian paradigm to find the best hypothesis among all possible hypotheses for the data to be recognized. The Bayes classification rule has been shown to be optimal when the class distributions represent the true distributions of the data to be classified. In practice, however, this condition is not satisfied: the classifier is trained on some training data and may be deployed to recognize data that differ from the training data. The use of entropy as an optimization criterion for various classification tasks is well established in the literature. In our work, we show that free energy, a thermodynamic concept directly related to entropy, can also be used as an objective criterion in classification. Furthermore, we show how this novel classification scheme can be used within the framework of existing Bayesian classification schemes implemented in current recognizers by simply modifying the class distributions a priori. Pilot experiments performed under mismatched and matched conditions further demonstrate the viability of free energy for classification in speech recognizers.
Similar resources
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
Pronunciation Scoring for the Hearing-Impaired
Automatic assessment of articulation and prosody is an important aid for speech therapy and language education. In this paper, we focus on speech therapy for hearing impairment and propose methods for automatic articulation and prosody scoring. The pronunciation problems of the hearing-impaired are briefly discussed. Three methods are developed for automatic pronunciation scoring for the hear...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Integration of multiple feature sets for reducing ambiguity in automatic speech recognition
This thesis presents a method to investigate the extent to which articulatory based acoustic features can be exploited to reduce ambiguity in automatic speech recognition search. The method proposed is based on a lattice re-scoring paradigm implemented to integrate articulatory based features into automatic speech recognition systems. Time delay neural networks are trained as feature detectors ...
Journal title:
Volume, issue
Pages -
Publication date: 2012